Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs.

Authors

  • M Remm
  • E Sonnhammer
Abstract

The complete genome sequence of the nematode Caenorhabditis elegans provides an excellent basis for studying the distribution and evolution of protein families in higher eukaryotes. Three fundamental questions are as follows: How many paralog clusters exist in one species, how many of these are shared with other species, and how many proteins can be assigned a functional counterpart in other species? We have addressed these questions in a detailed study of predicted membrane proteins in C. elegans and their mammalian homologs. All worm proteins predicted to contain at least two transmembrane segments were clustered on the basis of sequence similarity. This resulted in 189 groups with two or more sequences, containing, in total, 2647 worm proteins. Hidden Markov models (HMMs) were created for each family, and were used to retrieve mammalian homologs from the SWISSPROT, TREMBL, and VTS databases. About one-half of these clusters had mammalian homologs. Putative worm-mammalian orthologs were extracted by use of nine different phylogenetic methods and BLAST. Eight clusters initially thought to be worm-specific were assigned mammalian homologs after searching EST and genomic sequences. A compilation of 174 orthology assignments made with high confidence is presented.
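The clustering step described above, which groups proteins by sequence similarity and keeps only groups with two or more sequences, can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm was used, so the single-linkage approach below (via union-find) is an assumption, and the protein identifiers and similarity pairs are hypothetical placeholders rather than data from the study.

```python
from collections import defaultdict


def cluster_by_similarity(proteins, similar_pairs):
    """Single-linkage clustering: proteins joined by any chain of
    similar pairs end up in the same group (assumed method, not the paper's)."""
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)

    # Keep only groups with two or more sequences; singletons are dropped.
    return [g for g in groups.values() if len(g) >= 2]


# Hypothetical identifiers and similarity hits, for illustration only.
proteins = ["P1", "P2", "P3", "P4"]
similar_pairs = [("P1", "P2"), ("P2", "P3")]
clusters = cluster_by_similarity(proteins, similar_pairs)
```

Here P1, P2, and P3 fall into one group because they are linked through P2, while P4 has no similar partner and is excluded. In practice the similar pairs would come from a sequence comparison tool with a chosen score cutoff, which is left out of this sketch.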

Similar resources

Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss.

The str family of genes encoding seven-transmembrane G-protein-coupled or serpentine receptors related to the ODR-10 diacetyl chemoreceptor is very large, with at least 197 members in the Caenorhabditis elegans genome. The closely related stl family has 43 genes, and both families are distantly related to the srd family with 55 genes. Analysis of the structures of these genes indicates that a t...

Family Size and Turnover Rates among Several Classes of Small Non–Protein-Coding RNA Genes in Caenorhabditis Nematodes

It is important to understand the forces that shape the size and evolutionary histories of gene families. Here, we investigated the evolution of non-protein-coding RNA genes in the genomes of Caenorhabditis nematodes. We specifically focused on nested arrangements, that is, cases in which an RNA gene is entirely contained in an intron of another gene. Comparing these arrangements between specie...

Genomic classification of protein-coding gene families.

This chapter reviews analytical tools currently in use for protein classification, and gives an overview of the C. elegans proteome. Computational analysis of proteins relies heavily on hidden Markov models of protein families. Proteins can also be classified by predicted secondary or tertiary structures, hydrophobic profiles, compositional biases, or size ranges. Strictly orthologous protein f...

The taxonomy of developmental control in Caenorhabditis elegans.

The Caenorhabditis elegans genome sequence was surveyed for transcription factor and signaling gene families that have been shown to regulate development in a variety of species. About 10 to 25 percent of the genes in most of the gene families already have been genetically analyzed in C. elegans, about half of the genes detect probable orthologs in other species, and about 10 to 25 percent of t...

Analysis of protein domain families in Caenorhabditis elegans.

The Caenorhabditis elegans genome sequencing project has completed over half of this nematode's 100-Mb genome. Proteins predicted in the finished sequence have been compiled and released in the database Wormpep. Presented here is a comprehensive analysis of protein domain families in Wormpep 11, which comprises 7299 proteins. The relative abundance of common protein domain families was counted...

Journal title:
  • Genome research

Volume 10, Issue 11

Pages: -

Publication date: 2000